A. COMPUTATIONAL STRUCTURAL MECHANICS ON
The growing interest in sophisticated structures such as next-generation
composite aircraft, space stations, ceramic engines, and high-performance
armors places increasing demands on the optimality of the
structural components used. These demands are met by improving both the
accuracy and efficiency of the computational tools used to develop and
analyze these structures. The computational demands of these structures are
so great that engineers must frequently be satisfied with rather crude linear
analysis, even though the large deformations involved demand nonlinear
analysis in order to properly represent the structural behavior. Moreover, the
realistic simulation of the nonlinear dynamics of these systems remains
beyond the feasible range of current vector supercomputers. The true
potential for execution improvement lies in parallel and massively parallel
computing.

Even though supercomputers are today routinely used by large
industrial corporations, parallel processors are still thought of as exotic machines.
Currently, only a fraction of the engineering community is engaged in
applying concurrent computation technology. Incorporation of these new
machines  in  the  mainstream  of  large  scale  computations  is  more  challenging 
than  that  of  supercomputers,  and  will  have  a  greater  impact  on 
computational  engineering  power.  Moving  engineering  applications  to 
concurrent  processors  faces  significant  obstacles  that  will  have  to  be  resolved 
as  such  machines  are  becoming  available  on  a  commercial  scale.  These 
obstacles  center  on:  (a)  methods,  (b)  algorithms,  and  (c)  software.  The  present 
proposal addresses all of these issues. More specifically, the objectives of this
research are to (a) investigate, (b) implement, and (c) evaluate tools for
concurrent processing of very large structural engineering problems.

The  word  tool  is  used  in  a  broad  sense.  It  includes  methods, 
algorithms,  and  software. 
A.2.1 The FETI Method.

The  Finite  Element  Tearing  and  Interconnecting  (FETI)  method 
initially  developed  under  this  grant  is  a  practical  and  efficient  domain 
decomposition  (DD)  method  for  the  parallel  solution  of  self-adjoint  elliptic 
partial  differential  equations. 

A  given  spatial  domain  is  partitioned  into  non-overlapping 
subdomains  where  an  incomplete  solution  for  the  primary  field  is  first 
evaluated  using  a  direct  solver. 

Next,  intersubdomain  field  continuity  is  enforced  via  Lagrange 
multipliers applied at the subdomain interfaces. This "gluing" phase
generates  a  smaller  size  symmetric  dual  problem  where  the  unknowns  are 
the  Lagrange  multipliers,  and  which  is  best  solved  with  a  preconditioned 
conjugate  gradient  (PCG)  algorithm.  Each  iteration  of  the  PCG  algorithm 
requires  the  solution  of  independent  subdomain  problems.  For  static 
structural  analysis,  every  floating  subdomain  is  associated  with  a  singular 
stiffness  matrix  and  generates  a  set  of  interface  constraints.  Consequently,  the 
system  of  equations  governing  the  dual  interface  problem  is  in  general 
indefinite.  The  FETI  algorithm  deals  with  both  issues  by  incorporating  in  the 
solution  the  contribution  of  the  subdomain  rigid  body  modes  and  by  solving 
the  indefinite  interface  problem  with  a  preconditioned  conjugate  projected 
gradient  algorithm  (PCPG).  Each  projection  step  in  the  PCPG  algorithm  leads 
to  a  "natural"  coarse  problem.  We  have  shown  that  the  FETI  method  with 
the  Dirichlet  preconditioner  is  asymptotically  optimal  —  that  is,  the  condition 
number  of  the  preconditioned  dual  interface  problem  is  independent  of  the 
number  of  subdomains  and  grows  only  slowly  when  the  mesh  size  h 
approaches  0.  Therefore,  the  FETI  method  is  scalable  to  Massively  Parallel 
Processors  (MPP). 
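
The steps above can be written compactly in the notation commonly used for FETI. The symbols below are not defined in this report and are introduced here as assumptions, for illustration only: K^(s), u^(s), and f^(s) are the stiffness matrix, displacement vector, and load vector of subdomain s; B^(s) is a signed Boolean matrix that extracts interface values; R^(s) spans the rigid body modes (the null space of K^(s)) of a floating subdomain; K^(s)+ is a generalized inverse; and lambda is the vector of Lagrange multipliers.

```latex
\begin{aligned}
&K^{(s)} u^{(s)} = f^{(s)} - B^{(s)T}\lambda, \quad s = 1,\dots,N_s,
 \qquad \sum_{s=1}^{N_s} B^{(s)} u^{(s)} = 0, \\
&u^{(s)} = K^{(s)+}\bigl(f^{(s)} - B^{(s)T}\lambda\bigr) + R^{(s)}\alpha^{(s)},
 \qquad R^{(s)T}\bigl(f^{(s)} - B^{(s)T}\lambda\bigr) = 0, \\
&F_I \lambda - G\alpha = d, \qquad G^{T}\lambda = e, \\
&F_I = \sum_{s} B^{(s)} K^{(s)+} B^{(s)T}, \quad
 G = \bigl[\, B^{(1)}R^{(1)} \;\cdots\; B^{(N_s)}R^{(N_s)} \bigr], \quad
 d = \sum_{s} B^{(s)} K^{(s)+} f^{(s)}, \quad
 e^{(s)} = R^{(s)T} f^{(s)}, \\
&P = I - G\,(G^{T}G)^{-1}G^{T}.
\end{aligned}
```

The second line is the incomplete subdomain solution together with its solvability condition, the third line is the indefinite dual interface problem, and P is the projector applied in each PCPG iteration. In this notation the small system with matrix G^T G plays the role of the coarse problem.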

We  have  applied  the  FETI  method  to  the  solution  of  three- 
dimensional  structural  problems  discretized  essentially  with  beam  and  shell 
elements.  For  such  problems,  we  have  shown  that  the  FETI  algorithm 
compares  favorably  with  leading  DD  methods  such  as  the  Neumann- 
Neumann  algorithm. 

We  have  also  shown  that  even  when  the  global  stiffness  matrix  can  be 
assembled  and  stored  in  real  memory,  the  FETI  method  often  outperforms 
optimized direct solvers on both serial and parallel/vector processors.

The  FETI  method  has  been  implemented  on  the  iPSC-860,  the  KSR-1, 
and  the  CM-5.  It  has  been  further  extended  to  nonlinear  time-dependent 
analyses  and  to  eigenvalue  problems. 
A.2.2 Mesh Partitioning Algorithms.

Unstructured  meshes  are  used  in  several  large-scale  scientific  and 
engineering  problems,  including  finite-volume  methods  for  computational 
fluid  dynamics  and  finite  element  methods  for  structural  analysis.  Because  of 
their  large  size  and  computational  requirements  these  problems  are 
increasingly  solved  on  highly  parallel  machines  and  clusters  of  high-end 
workstations.  If  unstructured  problems  such  as  these  are  to  be  solved  on 
distributed-memory  parallel  computers,  their  data  structures  must  be 
partitioned  and  distributed  across  processors;  if  they  are  to  be  solved 
efficiently,  the  partitioning  process  must  maximize  load  balance  and 
minimize interprocessor communication. Recent investigations have also
shown that even when computing on a parallel machine that offers a virtual
shared memory environment, such as the KSR-1, mesh partitioning is
still desirable because it explicitly enforces data locality and therefore ensures
high levels of performance.

The development of efficient heuristics for solving the NP-hard
problem  of  graph  partitioning  has  been  a  very  active  research  area  in  the  last 
few  years.  Under  this  grant,  we  have  developed  and  implemented  a  number 
of  fast  algorithms  for  graph  and  mesh  partitioning  that  have  been 
demonstrated  to  be  useful  in  practical  large-scale  computational  science  and 
engineering  problems.  These  algorithms  are:  the  greedy  algorithm,  a 
bandwidth  minimization  based  algorithm,  a  recursive  version  of  the  latter 
algorithm,  principal  inertia  algorithms  and  their  recursive  versions,  a 
recursive  graph  bisection  algorithm  and  an  improved  implementation  of  the 
recursive  spectral  bisection  algorithm. 
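
To make the idea of a greedy partitioner concrete, the following is a minimal sketch in Python. It is not the implementation developed under this grant: the seed rule, the breadth-first growth, and the names greedy_partition, edge_cut, and grid_graph are all assumptions chosen for illustration. Each subdomain is grown from a seed vertex by absorbing unassigned neighbors until it reaches its share of the vertices, which balances the load, while the edge cut gives a rough proxy for interprocessor communication.

```python
from collections import deque


def grid_graph(nx, ny):
    """Adjacency lists of an nx-by-ny grid, a stand-in for a mesh graph."""
    adjacency = [[] for _ in range(nx * ny)]
    for i in range(nx):
        for j in range(ny):
            v = i * ny + j
            if i + 1 < nx:
                adjacency[v].append(v + ny)
                adjacency[v + ny].append(v)
            if j + 1 < ny:
                adjacency[v].append(v + 1)
                adjacency[v + 1].append(v)
    return adjacency


def greedy_partition(adjacency, num_parts):
    """Grow num_parts subdomains breadth-first, one after the other."""
    n = len(adjacency)
    part = [-1] * n          # -1 marks an unassigned vertex
    assigned = 0
    for p in range(num_parts):
        # Spread the remaining vertices evenly over the remaining parts.
        target = (n - assigned) // (num_parts - p)
        if target == 0:
            continue
        free = [v for v in range(n) if part[v] == -1]
        # Hypothetical seed rule: fewest unassigned neighbors, lowest index.
        seed = min(free, key=lambda v: (sum(part[w] == -1 for w in adjacency[v]), v))
        queue = deque([seed])
        size = 0
        while size < target:
            if not queue:
                # Disconnected remainder: restart from any unassigned vertex.
                queue.append(next(v for v in range(n) if part[v] == -1))
            v = queue.popleft()
            if part[v] != -1:
                continue
            part[v] = p
            size += 1
            queue.extend(w for w in adjacency[v] if part[w] == -1)
        assigned += size
    return part


def edge_cut(adjacency, part):
    """Number of graph edges whose end points lie in different subdomains."""
    cut = sum(part[v] != part[w] for v in range(len(adjacency)) for w in adjacency[v])
    return cut // 2


if __name__ == "__main__":
    graph = grid_graph(4, 4)
    partition = greedy_partition(graph, 4)
    print(partition, edge_cut(graph, partition))
```

A production partitioner would also weight vertices by computational cost and post-process the interfaces; this sketch only shows the growth loop.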

A.2.3 TOP/DOMDEC.

TOP/DOMDEC  is  a  Totally  Object-oriented  Program  written  in  C++  and 
GL  for  automatic  DOMain  DEComposition. 

It  is  both  a  software  tool  and  a  software  environment  for  mesh 
partitioning  and  parallel  processing.  It  is  a  software  tool  because  it  contains 
the  algorithms  for  automatic  mesh  decomposition  mentioned  above  and  a  set 
of  relevant  decision  making  tools  for  selecting  the  best  mesh  partition  for  a 
given  problem  and  a  given  multiprocessor.  It  is  also  a  software  environment 
because it allows advanced users to "plug in" their own mesh partitioning
algorithm  and  benefit  from  all  the  interactive  features  of  TOP/DOMDEC  that 
include  the  evaluation  of  load  balancing,  network  traffic  and  communication 
costs,  the  generation  of  parallel  data  structures,  and  the  use  of  state-of-the-art 
high-speed  graphics.  The  TOP/DOMDEC  project  started  towards  the  end  of 
the  funding  period  of  this  Grant  but  benefited  from  the  funded  development 
of  the  mesh  partitioning  algorithms. 
A.2.4 Evaluation of Parallel Hardware for Unstructured Meshes.

We  have  used  the  algorithms  and  codes  we  have  developed  for 
parallel  computations  on  fully  unstructured  grids  to  highlight  the  impact  of 
three  MPP  architectures  —  the  CM-2,  the  iPSC-860,  and  the  KSR-1  —  on  the 
implementational  strategies.  On  the  KSR-1  system,  we  have  contrasted  two 
different  programming  approaches,  one  designed  for  fast  porting  and  the 
other  for  high  performance.  We  have  analyzed  performance  results  obtained 
on  all  three  MPP  systems  in  terms  of  interprocessor  communication  costs, 
scalability,  and  sheer  performance.  We  have  concluded  that  in  general,  for 
parallel  unstructured  finite  element  and  finite  volume  scientific 
computations,  a  64K  CM-200  machine  with  a  VP  ratio  =  1  delivers  600  real 
MFLOPS,  a  128  processor  iPSC-860  system  delivers  532  real  MFLOPS,  and  a  64 
processor  KSR-1  system  delivers  480  MFLOPS.  We  have  also  pointed  out  that 
the  KSR-1  parallel  processor  delivers  substantially  higher  performance  results 
when  programmed  with  the  local  memory  paradigm  than  with  the  virtual 
shared  memory  paradigm. 
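
The sustained rates quoted above can be put in perspective by comparing them with the peak ratings listed for the same machine sizes in this report's table of systems studied (32,000, 7,680, and 2,560 Mflops respectively). The short Python sketch below does only that arithmetic. It assumes that the table rows describe the same configurations as the measurements, and the names SYSTEMS, fraction_of_peak, and mflops_per_node are illustrative.

```python
# Sustained Mflops quoted in the text; peak Mflops and node counts taken
# from the report's table of systems (assumed to be the same configurations).
SYSTEMS = {
    "CM-200, 64K processors": {"sustained": 600.0, "peak": 32000.0, "nodes": 65536},
    "iPSC-860, 128 processors": {"sustained": 532.0, "peak": 7680.0, "nodes": 128},
    "KSR-1, 64 processors": {"sustained": 480.0, "peak": 2560.0, "nodes": 64},
}


def fraction_of_peak(entry):
    """Sustained performance as a fraction of the rated peak."""
    return entry["sustained"] / entry["peak"]


def mflops_per_node(entry):
    """Sustained Mflops delivered per processing node."""
    return entry["sustained"] / entry["nodes"]


if __name__ == "__main__":
    for name, entry in SYSTEMS.items():
        print(f"{name}: {100 * fraction_of_peak(entry):.1f}% of peak, "
              f"{mflops_per_node(entry):.4f} Mflops per node")
```

On these figures the KSR-1 sustains the largest fraction of its rated peak even though its absolute rate is the lowest of the three, which is one reason raw Mflops alone are a poor basis for comparison.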

A.3 IMPACT ON COMPUTATIONAL SCIENCE INFRASTRUCTURE

A.3.1 The FETI Method.

The  FETI  method  has  attracted  the  attention  of  several  researchers  in 
the applied mathematics and engineering fields. Professor Jan Mandel at the
University of Colorado at Denver has proved the optimal scalability of this
method  and  has  developed  a  dual  version  known  as  the  Balancing  Domain 
Decomposition  algorithm.  A  team  of  scientists  at  the  NASA  Langley  Research 
Center  directed  by  Dr.  Jerry  Housner  has  also  used  the  FETI  approach  to 
develop  a  procedure  for  the  coupled  analysis  of  independently  modeled 
substructures.  Professor  Roland  Keunings  at  the  University  of  Louvain, 
Belgium,  has  extended  its  range  of  applications  to  the  analysis  of  polymer 
flows. 

A.3.2  TOP/DOMDEC. 

The  TOP/DOMDEC  software  for  mesh  partitioning  and  parallel 
processing  is  currently  used  at  the  NASA  Ames  Research  Center,  the  NASA 
Lewis  Research  Center,  the  NASA  Langley  Research  Center,  the  Lockheed 
Palo  Alto  Mechanics  Laboratory,  the  Department  of  Electrical  Engineering  at 
the  University  of  Michigan,  the  Ford  Motors  Research  Laboratory,  INRIA  and 
ONERA  in  France,  and  the  University  of  Liege  and  the  University  of  Louvain 
in  Belgium. 
The advent of massively parallel processors (MPP) initiated a search for
appropriate solution algorithms for important kernels of applications. Since
PDEs are at the root of most scientific applications, research activity in parallel
PDE algorithms has been especially intense.

There  are  several  needs  for  a  satisfactory  parallel  algorithm: 

Scalability: 

Scalability refers to the need for the performance of an algorithm to
increase linearly with the number of processors used. Care is needed in the
definition here, as most algorithms are not scalable in the strict sense. Given a
problem of fixed size, as more processors are used the work is subdivided
into smaller pieces, which are usually less efficient in the sense that
communication between processors becomes increasingly important
relative to computation. Thus overall performance, measured in Mflops, say,
will follow a sublinear curve as a function of the number of processors.

In most application areas we are not interested in the solution of a
fixed-size PDE problem that would be solvable on one or a few processors. The
need is to solve immense problems, as large as the system will allow. Thus the
more appropriate definition of scalability is that as the number of processors is
increased the problem size is scaled correspondingly, and one then requires
that performance scale linearly with problem size.
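
The difference between the two definitions can be illustrated with a toy cost model. The Python sketch below assumes a hypothetical two-dimensional grid computation in which each processor's computation is proportional to the number of points it holds and its communication is proportional to the square root of that number (the boundary of its piece). The model, the comm_cost parameter, and the function names are assumptions, not measurements from this work.

```python
import math


def efficiency(n, p, comm_cost=1.0):
    """Parallel efficiency for n grid points on p processors.

    Hypothetical model: computation ~ n/p per processor, communication
    ~ comm_cost * sqrt(n/p) per processor, and none on one processor.
    """
    work = n / p
    comm = comm_cost * math.sqrt(work) if p > 1 else 0.0
    return work / (work + comm)


def fixed_size_curve(n, processor_counts):
    """Efficiency when the same problem is spread over more processors."""
    return [efficiency(n, p) for p in processor_counts]


def scaled_size_curve(n_per_processor, processor_counts):
    """Efficiency when the problem grows in proportion to the processors."""
    return [efficiency(n_per_processor * p, p) for p in processor_counts]


if __name__ == "__main__":
    counts = [4, 16, 64]
    print("fixed size :", fixed_size_curve(4096, counts))
    print("scaled size:", scaled_size_curve(1024, counts))
```

Under this model the fixed-size efficiency falls as processors are added, while the scaled-size efficiency stays constant, which is the behavior the second definition asks for.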

Complexity: 

Even  on  a  single  processor,  the  time  required  to  solve  a  problem  may 
not  scale  linearly  with  problem  size.  Consider  a  PDE  which  has  been 
discretized  in  terms  of  N  degrees  of  freedom.  The  computational  complexity 
of  the  algorithm  refers  to  the  dependence  of  solution  time  on  N.  Because 
very  large  problems  of  the  type  contemplated  for  solution  on  MPP  systems 
will  have  N  large  and  proportional  to  the  number  of  processors  P,  algorithms 
whose  performance  does  not  scale  approximately  linearly  with  N  will  become 
prohibitively  expensive  with  problem  size. 


The requirement for a good numerical solution method for PDEs on
MPP systems is then that the algorithm have computational complexity O(N)
and be scalable.
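
A short calculation shows why linear complexity matters once the problem size grows with the machine. The Python sketch below assumes perfect parallelization (the total cost is simply divided by P) and N proportional to P. The cost functions and names are illustrative assumptions.

```python
import math


def time_per_processor(cost, n_per_processor, p):
    """Idealized solution time when N = n_per_processor * p unknowns are
    solved on p processors and the total cost is shared perfectly."""
    n = n_per_processor * p
    return cost(n) / p


def linear(n):
    return float(n)


def n_log_n(n):
    return n * math.log2(n)


def quadratic(n):
    return float(n) ** 2


if __name__ == "__main__":
    for p in (1, 64, 4096):
        print(p, [time_per_processor(c, 1000, p) for c in (linear, n_log_n, quadratic)])
```

With an O(N) method the time stays constant as the machine and problem grow together; with an O(N^2) method it grows in proportion to P even under this idealized assumption.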

Multigrid  Methods 

Among  the  known  scalable  algorithms  for  elliptic  PDE  solution,  the 
multigrid,  domain  decomposition  and  preconditioned  conjugate  gradient 
methods  play  a  prominent  role.  In  appropriate  situations  each  has 
complexity close to linear in N and scales linearly in P. However, multigrid
methods have a particular difficulty on parallel machines in that they do not
have uniform amounts of work to be performed in all phases. These
algorithms use a hierarchy of grids, ranging in size from 1 point to N points.
When processing very coarse grids, it is impossible to use P processors
effectively, for example if P is greater than the number of grid points at that
level.
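
The idle-processor problem is easy to quantify. The Python sketch below assumes a standard hierarchy in which each coarser grid has one quarter of the points of the previous one (two-dimensional coarsening by a factor of two in each direction) and at most one processor per grid point. The function name and the coarsening factor are assumptions for illustration.

```python
def multigrid_utilization(n_fine, p, coarsening=4):
    """Return (grid points, fraction of p processors that can be busy)
    for every level of a standard multigrid hierarchy."""
    levels = []
    n = n_fine
    while n >= 1:
        levels.append((n, min(n, p) / p))
        n //= coarsening
    return levels


if __name__ == "__main__":
    for points, busy in multigrid_utilization(4096, 256):
        print(f"{points:5d} points: {100 * busy:6.2f}% of processors busy")
```

For a 4096-point fine grid on 256 processors, every level with fewer than 256 points leaves processors idle, down to a single active processor on the coarsest grid.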


This  inherent  disadvantage  of  multigrid  methods  has  led  to  the  search 
for  truly  parallel  MG  methods  -  multigrid  methods  that  can  utilize  all 
processors  all  of  the  time.  Frederickson  and  McBryan  developed  the  PSMG 
algorithm  for  this  purpose.  With  a  fine  grid  of  size  N  it  can  utilize  up  to  N 
processors  all  of  the  time.  At  the  same  time  it  preserves  the  scalability  and 
linear  complexity  of  standard  multigrid. 

PSMG  research 

Under  the  current  AFOSR  grant  we  have  studied  many  aspects  of 
PSMG.  One  set  of  results  demonstrates  that  the  simplest  PSMG  algorithm  is 
more  efficient  on  N  processors  than  even  the  fastest  known  conventional 
MG algorithms. PSMG methods are characterized by a fast convergence rate:
they have a small convergence factor per iteration. While convergence rates
are often used to compare methods, they are not a suitable measure because the
work per iteration is not considered. The normalized convergence rate, defined
as the amount of work required to reduce the error by one decimal digit (a
factor of 10), is therefore a more appropriate basis for comparison. For parallel
algorithms one must introduce normalized convergence rates for both
computation and communication, and these two quantities then characterize
algorithm performance. We have shown that PSMG is twice as efficient in
computation and five times as efficient in communication as the optimal
red-black MG algorithms when N processors are available.
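
As a worked illustration of the normalized convergence rate, the Python sketch below converts a convergence factor per iteration and a work count per iteration into the work needed to gain one decimal digit of accuracy. The function name and the sample numbers are made up for the example and are not results from this research.

```python
import math


def work_per_decimal_digit(convergence_factor, work_per_iteration):
    """Work needed to reduce the error by a factor of 10.

    An error multiplied by convergence_factor each iteration needs
    1 / (-log10(convergence_factor)) iterations per decimal digit.
    """
    if not 0.0 < convergence_factor < 1.0:
        raise ValueError("convergence factor must lie strictly between 0 and 1")
    iterations = 1.0 / (-math.log10(convergence_factor))
    return iterations * work_per_iteration


if __name__ == "__main__":
    # Hypothetical methods: A converges faster per iteration, but each
    # iteration costs four times as much as an iteration of B.
    print("A:", work_per_decimal_digit(0.1, 20.0))
    print("B:", work_per_decimal_digit(0.5, 5.0))
```

In this made-up comparison method B needs less total work per digit despite its larger convergence factor, which is why the raw convergence rate alone can mislead. The same function can be applied separately to computation and communication costs.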

We  have  extended  the  PSMG  algorithm  to  anisotropic  equations, 
showing  that  these  may  be  solved  as  easily  as  isotropic  systems.  The  analysis 
in  this  case  is  complicated  by  the  need  to  choose  a  semi-coarsening  scheme 
and  possibly  one-dimensional  solvers.  The  paper  provides  a  set  of  measured 
performance  data  for  the  resulting  algorithm. 
B.2 PERFORMANCE EVALUATION OF MPP SYSTEMS

Performance evaluation provides tools both for determining the absolute
performance of MPP systems and for the relative comparison of such
systems. In addition to simply measuring performance, one wants in practice
to understand the measured performance. Thus performance evaluation has
both an analytical and an observational aspect. We have pursued both aspects
in our AFOSR research.

Performance  evaluation  is  inherently  a  characteristic  of  the  systems  that  are 
measured.  We  have  emphasized  as  wide  a  range  as  possible  of  MPP  systems  - 
SIMD,  MIMD,  shared  memory  and  distributed  memory.  Systems  studied 
during  this  research  and  reported  in  publications  include: 


System                            Number of Nodes   Peak Mflops

Meiko CS-1                                     16           960
SUPRENUM-1                                    256         5,120
Intel iPSC2
Intel iPSC/860                                128         7,680
Thinking Machines Corp. CM-2               65,536        24,000
Thinking Machines Corp. CM-200             65,536        32,000
Thinking Machines Corp. CM-5                1,024       131,000
Kendall Square KSR-1                           64         2,560
Evans and Sutherland ES-1                      32           640
Myrias SPS-2                                  128            64

Each of these systems has special characteristics that need to be understood for
a proper evaluation. In fact, designing appropriate performance measures is
non-trivial.

We have adopted a multi-level approach to performance analysis. At
the lowest level we have studied the behavior of systems in simple
arithmetic and communication tasks, for example by measuring the cost of a
parallel vector multiply or of a point-to-point data exchange. Such
experiments are useful in understanding which operations actually approach
the manufacturer's rated peak performance, but say little about the behavior of
real algorithms. At the second level we have studied the behavior of
significant kernels, such as PDE solvers. These are components of full
applications, yet simple enough to be analyzed fairly completely. Typical
examples that we have used include relaxation, multigrid and
multidimensional FFT.

Our study focused on four computing systems, each representing one of the
major parallel architecture classes: a distributed memory SIMD system (the
Connection Machine CM-2), a shared memory MIMD system (Evans &
Sutherland ES-1), a shared/distributed memory hybrid system (Myrias SPS-2),
and a distributed memory MIMD system (Intel iPSC/860). Using an integrated
multilevel approach we can accurately describe the behavior of matrix/vector
operations, relaxation techniques, and PDE applications on these
architectures.

To demonstrate the usefulness of these models, we used a concrete
example of a 2D fluid code describing atmospheric and oceanic systems (the
Shallow Water Equations) and showed how such models can predict application
performance to within ten percent error.

We  presented  performance  models  of  numerical  computations  on  the 
Connection  Machine  CM-2,  a  massively  parallel  distributed  memory 
processor array containing 65,536 processors in a hypercube topology. Our
research identifies communication and computation characteristics that
predict the performance of matrix/vector operations, conjugate gradient
methods,  relaxation  techniques,  and  PDE  applications  on  this  architecture. 
Utilizing  a  concrete  example  of  a  2D  fluid  code  describing  atmospheric  and 
ocean  systems,  we  demonstrated  how  such  models  can  be  used  to  predict 
application  performance  to  within  five  percent  error. 

We  studied  the  effectiveness  of  the  parallel  microarchitecture 
employed  in  the  Intel  i860  RISC  microprocessor  at  performing  matrix/vector 
computation  kernels  of  scientific  applications.  The  key  issue  we  examined  is 
how  the  data  access  patterns  dictate  system  performance.  We  studied  blocked 
algorithms  and  data  mapping  techniques  that  improve  data  locality  of 
multidimensional data structures. We demonstrated that the cache miss ratio,
bus  utilization  and  external  memory  access  optimization  are  the  main  factors 
driving  the  performance  of  the  system. 
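
The loop structure of a blocked algorithm can be sketched as follows. This Python version is only meant to show how the iteration space is tiled so that small blocks of the operands are reused before moving on. It is not the i860 code studied here, the block size is an arbitrary assumption, and an interpreted language will not exhibit the cache effects that motivate blocking on real hardware.

```python
def blocked_matmul(a, b, block=2):
    """Multiply matrices a (n x k) and b (k x m), given as lists of rows,
    visiting the iteration space one block-by-block tile at a time."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for ii in range(0, n, block):
        for kk in range(0, k, block):
            for jj in range(0, m, block):
                # Work on one tile: the same small pieces of a, b and c
                # are reused here before the next tile is touched.
                for i in range(ii, min(ii + block, n)):
                    row_c = c[i]
                    for p in range(kk, min(kk + block, k)):
                        a_ip = a[i][p]
                        row_b = b[p]
                        for j in range(jj, min(jj + block, m)):
                            row_c[j] += a_ip * row_b[j]
    return c


if __name__ == "__main__":
    print(blocked_matmul([[1, 2, 3], [4, 5, 6]], [[7, 8], [9, 10], [11, 12]]))
```

The result is independent of the block size; only the order in which memory is touched changes.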
AFOSR support has been critical in allowing the Center for Applied
Parallel Processing (CAPP) at the University of Colorado to acquire and operate a
number of advanced architectures. Beginning with the acquisition of an
Evans and Sutherland ES-1 in 1989, we followed with a Myrias SPS-1 in 1990
and a Kendall Square KSR-1 in 1992. These have been the ground-breaking
architectures for the introduction of massively parallel shared memory
approaches. We have pursued these developments because we believe that
message passing approaches are so difficult from a user's standpoint that they
will never be widely adopted. By contrast, virtual shared memory approaches
as in the Myrias SPS-1 and Kendall Square KSR-1 offer the possibility of
hiding all of the message passing protocol from users. Unlike true shared
memory systems, these systems are also inherently scalable.


AFOSR support was used to fund graduate research associates who
became expert in the use of these systems. These students then provided
consulting  help  to  other  potential  users  of  the  systems. 

A  second  aspect  of  hardware  integration  was  the  development  of 
prototype  heterogeneous  computing  environments.  A  Stardent  TITAN 
multiprocessor  graphics  system  was  acquired  and  was  successfully  interfaced 
to  the  backend  of  a  TMC  CM-2.  Most  of  the  software  development  involved 
was done by AFOSR-supported graduate students. Thinking Machines
Corporation was also extremely helpful in supplying software support. The
resulting  software  was  adapted  by  TMC  as  a  commercial  product  and  is 
currently  running  at  many  sites  world-wide.  The  underlying  concept  here 
was  to  provide  the  ease  of  use  of  a  Stardent  visualization  system  to  a 
Connection  Machine  user.  Rather  than  implement  Stardent  AVS  software 
on the CM-2, we provided an interface to CM-2 data through the high-speed
CM-2  I/O  ports,  allowing  AVS  programs  to  be  written  that  effectively 
displayed  CM-2  computations  in  real  time. 
C. OTHER APPLICATIONS ON PARALLEL MACHINES


C.1  COMPRESSIBLE  CONVECTION  AND  PULSATIONS  ON 
MASSIVELY  PARALLEL  COMPUTERS 

Philip  W.  Jones 


In  red  giant  stars,  convection  is  the  dominant  heat  transport 
mechanism.  This  convection  is  vigorous  and  may  become  supersonic.  Red 
giant  stars  also  pulsate  globally  in  the  fundamental  or  first  harmonic  acoustic 
mode.  Because  the  motions  are  very  vigorous  and  the  timescales  for  the  two 
types  of  motion  are  similar,  acoustic  waves  are  likely  to  interact  strongly  with 
the convection. We have completed a series of simulations on the
Connection Machine CM-2 that examine how acoustic waves interact with,
and are excited by, fully compressible fluid convection.

We  have  developed  Connection  Machine  simulations  of  vigorous 
convection  in  two-dimensional  polytropic  models  in  which  a  layer  unstable 
to  convection  is  situated  between  two  stably-stratified  layers.  The  fully 
compressible  fluid  equations  are  advanced  in  time  from  this  initial  state 
using an explicit second-order Adams-Bashforth scheme for most terms, but
an implicit Crank-Nicolson scheme is used for the temperature diffusion to
avoid  excessive  restrictions  on  the  time  step.  All  spatial  derivatives  are 
computed  using  second-order  finite  differences.  These  codes  are  quite 
efficient  on  the  CM-2,  achieving  scaled  performances  of  2.4  Gflops. 
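
The combination of an explicit Adams-Bashforth step with an implicit Crank-Nicolson step can be illustrated on a scalar model problem du/dt = f(u) + lam*u, where f stands in for the explicitly treated terms and lam*u for the stiff diffusive term. The Python sketch below is a minimal illustration under that assumption. It is not the Connection Machine code, and the names ab2_cn_step and integrate are hypothetical.

```python
def ab2_cn_step(u, f_now, f_prev, lam, dt):
    """Advance du/dt = f(u) + lam*u by one step of size dt.

    f is extrapolated with second-order Adams-Bashforth; the linear term
    lam*u is averaged over old and new time levels (Crank-Nicolson).
    """
    explicit = dt * (1.5 * f_now - 0.5 * f_prev)
    return (u * (1.0 + 0.5 * dt * lam) + explicit) / (1.0 - 0.5 * dt * lam)


def integrate(u0, f, lam, dt, steps):
    """Take `steps` steps from u0. With no earlier level available, the first
    step uses f_prev = f(u0), which reduces the explicit part to forward Euler."""
    u = u0
    f_prev = f(u0)
    for _ in range(steps):
        f_now = f(u)
        u = ab2_cn_step(u, f_now, f_prev, lam, dt)
        f_prev = f_now
    return u


if __name__ == "__main__":
    # du/dt = 1 - u with u(0) = 0 has the exact solution 1 - exp(-t).
    print(integrate(0.0, lambda u: 1.0, -1.0, 0.01, 100))
```

Because the stiff term is treated implicitly, the time step is limited by the explicit terms only, which is the motivation given above for handling the temperature diffusion with Crank-Nicolson.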

One  series  of  simulations  examined  the  interaction  of  convection  with 
acoustic  waves  which  were  artificially  driven  at  the  lower  boundary.  The 
change  in  energy  of  the  wave  was  analyzed  by  computing  the  work  done  in 
the  frame  of  the  wave  by  gas  and  turbulent  pressures  as  the  wave  passed 
through  the  layer  of  fluid  undergoing  vigorous  time-dependent  convection. 
We found that the driving of the waves increased as a function of wave
frequency and that turbulent pressure driving was comparable to, and even
exceeded, that due to gas pressure. Analysis of our results using a
simple  two-stream  model  of  the  convection  indicates  that  wave  focusing  is 
responsible  for  turbulent  pressure  damping  of  the  acoustic  wave  while 
driving  of  the  waves  is  caused  by  a  forced  modulation  due  to  the  acoustic 
wave, especially where the wave amplitude is large. Red giant pulsations are
large-amplitude acoustic waves, so the latter effects probably dominate in these
stars.  We  also  found  that  time-dependent  convection  provided  a  stochastic 
component  of  driving  which  may  be  responsible  for  irregular  variability  in 
some  pulsating  red  giant  stars. 
In a second simulation, we examined supersonic penetrative
convection. The convection in this case reached a Mach number of 2.4 in the
upper boundary layer and exhibited a unique time-dependence, possibly due
to the limited aspect ratio of our domain. This system was found to excite
large-amplitude acoustic waves. Such excitation has been predicted by simple
models of isotropic turbulence, and the acoustic emissivity was predicted to
increase with the cube of the Mach number. However, this is the first time
self-excited acoustic waves with large amplitudes have been observed in
simulations of compressible convection. Based on our previous results,
continued driving of this wave should be expected due to modulation of the
convection by the large-amplitude wave. However, our analysis is complicated
by an unexpectedly strong horizontal mean component of the convective
time dependence. We are currently starting another simulation with a larger
aspect ratio to isolate aspect ratio effects from the wave-convection
interactions.

All  of  the  work  described  above  was  completed  as  part  of  a  Ph.D.  thesis 
which  was  successfully  defended  on  23  August  1991.  At  least  two  additional 
papers will be appearing soon in the Astrophysical Journal.
C.2.1 Spin Models

Spin  models  were  invented  as  simple  statistical  mechanical  models  of 
ferromagnetism.  In  most  cases  they  exhibit  the  cooperative  behavior  found  in 
phase  transitions,  which  arises  from  the  development  of  long  range  order  in 
the  system.  The  simplest  example  of  such  a  model,  in  which  the  magnetic 
moments  are  assumed  to  be  classical  one-dimensional  spins  capable  of  only 
two  orientations,  is  the  so-called  Ising  model.  The  simplest  generalization  of 
this  discrete  spin  model  is  obtained  by  allowing  the  spins  to  point  in  q 
directions  -  the  q-state  Potts  model. 
Another generalization one can make is to have continuous rather
than discrete spin variables. Then the spins are represented as unit vectors in
N dimensions, giving rise to the O(N) models. N = 1 gives the Ising model
again;  N  =  2  is  the  XY  model,  also  known  as  the  planar  Heisenberg  or  planar 
rotator  model;  and  N  =  3  is  the  Heisenberg  model. 
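
For reference, the models named above are usually written with the following energy functions. The coupling constant J and the sum over nearest-neighbor pairs <i,j> are standard conventions assumed here, not notation taken from this report.

```latex
\begin{aligned}
H_{\text{Ising}} &= -J \sum_{\langle i,j \rangle} s_i s_j, & s_i &\in \{-1,+1\}, \\
H_{\text{Potts}} &= -J \sum_{\langle i,j \rangle} \delta_{\sigma_i,\sigma_j}, & \sigma_i &\in \{1,\dots,q\}, \\
H_{O(N)} &= -J \sum_{\langle i,j \rangle} \mathbf{s}_i \cdot \mathbf{s}_j, & \mathbf{s}_i &\in \mathbb{R}^N,\ |\mathbf{s}_i| = 1.
\end{aligned}
```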

In order to investigate the behavior of spin models near their phase
transitions, Monte Carlo algorithms are traditionally used. Unfortunately, the
simplest algorithm, the Metropolis algorithm, suffers from the problem of
critical slowing down, which dramatically reduces its efficiency. Therefore we
use instead the new over-relaxed and cluster algorithms, which help to
alleviate this problem.
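
As an illustration of what a cluster update looks like, the Python sketch below performs one single-cluster (Wolff-type) update for the two-dimensional Ising model with coupling J = 1 and periodic boundaries. The report does not say which cluster algorithms were used, so the choice of the Wolff update, the function name wolff_update, and the lattice representation are assumptions made for the example.

```python
import math
import random


def wolff_update(spins, beta, rng):
    """Grow and flip one cluster on a periodic L x L lattice of +1/-1 spins.

    Aligned neighbors join the cluster with probability 1 - exp(-2 * beta).
    Spins are flipped as they are added, which also marks them as visited.
    Returns the number of spins flipped.
    """
    size = len(spins)
    p_add = 1.0 - math.exp(-2.0 * beta)
    i, j = rng.randrange(size), rng.randrange(size)
    seed_spin = spins[i][j]
    spins[i][j] = -seed_spin
    stack = [(i, j)]
    flipped = 1
    while stack:
        x, y = stack.pop()
        neighbors = (((x + 1) % size, y), ((x - 1) % size, y),
                     (x, (y + 1) % size), (x, (y - 1) % size))
        for nx, ny in neighbors:
            if spins[nx][ny] == seed_spin and rng.random() < p_add:
                spins[nx][ny] = -seed_spin
                stack.append((nx, ny))
                flipped += 1
    return flipped


if __name__ == "__main__":
    rng = random.Random(0)
    lattice = [[rng.choice((-1, 1)) for _ in range(16)] for _ in range(16)]
    sizes = [wolff_update(lattice, 0.44, rng) for _ in range(100)]
    print("mean cluster size:", sum(sizes) / len(sizes))
```

Near the transition the clusters become large, so a single update changes the configuration on all length scales at once, whereas a Metropolis sweep changes it one spin at a time.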

We have investigated the Ising, q=2 Potts, XY and O(3) models in two
dimensions and the Ising model in three dimensions, using various cluster
algorithms.

C.2.2  Dynamically  Triangulated  Random  Surfaces 

Dynamically  triangulated  random  surfaces  provide  convenient 
discretizations of strings that can be simulated numerically using Monte
Carlo  techniques.  As  a  point  particle  in  space  moves  through  time  it  traces 
out  a  line;  similarly  as  the  string,  which  looks  like  a  line  in  space,  moves 
through  time  it  sweeps  out  a  two-dimensional  surface  called  the  worldsheet. 
Thus  there  are  two  ways  in  which  to  discretize  the  string:  either  the 
worldsheet  is  discretized  or  the  (d-dimensional)  space-time  in  which  the 
string  is  embedded  is  discretized;  we  consider  the  former.  Such  discretized 
surface  models  fall  into  three  categories:  regular  surfaces,  fixed  random 
surfaces  and  dynamical  random  surfaces.  In  the  first,  the  surface  is  composed 
of  plaquettes  in  a  d-dimensional  regular  hypercubic  lattice;  in  the  second,  the 
surface  is  randomly  triangulated  once  at  the  beginning  of  the  simulation;  and 
in  the  third  the  random  triangulation  becomes  dynamical  (i.e.  is  changed 
during  the  simulation). 

It  is  these  dynamically  triangulated  random  surfaces  we  wish  to 
simulate.  Such  a  simulation  is,  in  effect,  that  of  a  fluid  surface.  This  is 
because  the  varying  triangulation  means  that  there  is  no  fixed  reference 
frame,  which  is  precisely  what  one  would  expect  of  a  fluid  where  the 
molecules  at  two  different  points  could  interchange  and  still  leave  the  surface 
intact.  In  string  theory,  this  is  called  reparametrization  invariance. 
Unfortunately, the straightforward simulation of such a surface yields rather
disappointing  results,  in  that  the  surface  appears  to  be  in  a  very  crumpled 
state.  The  reason  for  this  is  that  spike-like  configurations  in  the  surface  are 
not  suppressed,  allowing  it  to  degenerate  into  a  spiky,  crumpled  object.  To 
overcome this difficulty, one adds extrinsic curvature to smooth out the
surface. Thus we simulate dynamically triangulated random surfaces with
extrinsic curvature.

An  alternative  way  to  cure  the  pathological  nature  of  the  surfaces  is  to 
put  spin  models  on  them.  The  result  of  this  is  in  fact  a  simulation  of  matter 
coupled  to  quantum  gravity  in  two  dimensions.  We  have  investigated  the 
simplest  case,  namely  the  quenched  Ising  model. 